We discuss two kinds of semantics relevant to Computer Vision (CV) systems - Visual Semantics and Lexical Semantics. While visual semantics focus on how humans build concepts when using vision to perceive a target reality, lexical semantics focus on how humans build concepts of the same target reality through the use of language. The lack of coincidence between visual and lexical semantics, in turn, has a major impact on CV systems in the form of the Semantic Gap Problem (SGP). The paper, while extensively exemplifying the lack of coincidence as above, introduces a general, domain-agnostic methodology to enforce alignment between visual and lexical semantics.
translated by 谷歌翻译
降解的图像通常存在于字符图像的一般来源中,从而导致特征识别结果不令人满意。现有的方法有专门的努力来恢复降级的角色图像。但是,这些方法获得的降解结果似乎并不能提高字符识别性能。这主要是因为当前方法仅着眼于像素级信息,而忽略了角色的关键特征,例如其字形,从而在脱索过程中导致字符标志性损害。在本文中,我们介绍了一个基于字形融合和注意力机制(即Churformer)的新型通用框架,以精确地恢复角色图像而不改变其固有的字形。与现有的框架不同,Charformer引入了一个并行目标任务,用于捕获其他信息并将其注入DICONISE骨架的图像,这将在字符图像DeNoising期间保持角色字形的一致性。此外,我们利用基于注意力的网络进行全局本地特征交互,这将有助于处理盲目的denoising和增强deNoSising绩效。我们将Charformer与多个数据集上的最新方法进行比较。实验结果表明了杂形和质量上的优势。
translated by 谷歌翻译
构建高质量的角色图像数据集很具有挑战性,因为现实世界图像通常受图像退化的影响。将当前图像恢复方法应用于此类现实世界字符图像时存在局限性,因为(i)字符图像中的噪声类别与一般图像中的噪声类别不同; (ii)现实世界字符图像通常包含更复杂的图像降解,例如不同噪声水平的混合噪声。为了解决这些问题,我们提出了一个现实世界角色恢复网络(RCRN),以有效恢复降级的角色图像,其中使用字符骨架信息和比例安装特征提取来获得更好的恢复性能。所提出的方法由骨架提取器(SENET)和角色图像修复器(CIRNET)组成。 Senet旨在保持角色的结构一致性并使复杂的噪声正常化。然后,Cirnet从降级的角色图像及其骨骼中重建了清洁图像。由于缺乏现实世界字符图像恢复的基准,我们构建了一个包含1,606个字符图像的数据集,这些图像具有现实世界中的降级,以评估所提出方法的有效性。实验结果表明,RCRN在定量和质量上优于最先进的方法。
translated by 谷歌翻译
长尾效应是一个常见的问题,它限制了对现实世界数据集中深度学习模型的性能。由于字符使用频率差异,角色图像数据集的开发还受到这种不平衡数据分布的影响。因此,当当前的角色识别方法应用于现实世界数据集时,尤其是尾巴中缺少训练样本的字符类别,例如不常见的字符或历史文档中的字符。在本文中,我们通过自由基提取(即REZCR)提出一个零摄像的角色识别框架,以提高几个样本字符类别的识别性能,在其中我们通过分解和分解和分解和分解和分解和分解字符的图形单位来利用有关的信息重建拼字法之后的字符。 REZCR由基于注意力的激进信息提取器(RIE)和基于知识图的角色推理器(KGR)组成。 RIE的目的是认识到候选激进分子及其从角色图像中可能的结构关系。结果将被馈入KGR,以通过使用预设计的字符知识图来识别目标字符。我们在多个数据集上验证我们的方法,REZCR显示出有希望的实验结果,尤其是对于少数样本字符数据集。
translated by 谷歌翻译
Molecular representation learning is crucial for the problem of molecular property prediction, where graph neural networks (GNNs) serve as an effective solution due to their structure modeling capabilities. Since labeled data is often scarce and expensive to obtain, it is a great challenge for GNNs to generalize in the extensive molecular space. Recently, the training paradigm of "pre-train, fine-tune" has been leveraged to improve the generalization capabilities of GNNs. It uses self-supervised information to pre-train the GNN, and then performs fine-tuning to optimize the downstream task with just a few labels. However, pre-training does not always yield statistically significant improvement, especially for self-supervised learning with random structural masking. In fact, the molecular structure is characterized by motif subgraphs, which are frequently occurring and influence molecular properties. To leverage the task-related motifs, we propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT). MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt. The prompt effectively augments the molecular graph with meaningful motifs in the continuous representation space; this provides more structural patterns to aid the downstream classifier in identifying molecular properties. Extensive experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction, with or without a few fine-tuning steps.
translated by 谷歌翻译
Adder Neural Network (AdderNet) provides a new way for developing energy-efficient neural networks by replacing the expensive multiplications in convolution with cheaper additions (i.e.l1-norm). To achieve higher hardware efficiency, it is necessary to further study the low-bit quantization of AdderNet. Due to the limitation that the commutative law in multiplication does not hold in l1-norm, the well-established quantization methods on convolutional networks cannot be applied on AdderNets. Thus, the existing AdderNet quantization techniques propose to use only one shared scale to quantize both the weights and activations simultaneously. Admittedly, such an approach can keep the commutative law in the l1-norm quantization process, while the accuracy drop after low-bit quantization cannot be ignored. To this end, we first thoroughly analyze the difference on distributions of weights and activations in AdderNet and then propose a new quantization algorithm by redistributing the weights and the activations. Specifically, the pre-trained full-precision weights in different kernels are clustered into different groups, then the intra-group sharing and inter-group independent scales can be adopted. To further compensate the accuracy drop caused by the distribution difference, we then develop a lossless range clamp scheme for weights and a simple yet effective outliers clamp strategy for activations. Thus, the functionality of full-precision weights and the representation ability of full-precision activations can be fully preserved. The effectiveness of the proposed quantization method for AdderNet is well verified on several benchmarks, e.g., our 4-bit post-training quantized adder ResNet-18 achieves an 66.5% top-1 accuracy on the ImageNet with comparable energy efficiency, which is about 8.5% higher than that of the previous AdderNet quantization methods.
translated by 谷歌翻译
In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining. Our goal is to provide an easy-to-use BERT pretraining toolkit for the research community and industry. Thus, the pretraining of popular language models on customized datasets is affordable with limited resources. Experiments show that, to achieve the same or better GLUE scores, the time cost of our toolkit is over $6\times$ times less for BERT Base and $9\times$ times less for BERT Large when compared with the original BERT paper. The documentation and code are released at https://github.com/extreme-bert/extreme-bert under the Apache-2.0 license.
translated by 谷歌翻译
In recent years, semi-supervised graph learning with data augmentation (DA) is currently the most commonly used and best-performing method to enhance model robustness in sparse scenarios with few labeled samples. Differing from homogeneous graph, DA in heterogeneous graph has greater challenges: heterogeneity of information requires DA strategies to effectively handle heterogeneous relations, which considers the information contribution of different types of neighbors and edges to the target nodes. Furthermore, over-squashing of information is caused by the negative curvature that formed by the non-uniformity distribution and strong clustering in complex graph. To address these challenges, this paper presents a novel method named Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation (HG-MDA). For the problem of heterogeneity of information in DA, node and topology augmentation strategies are proposed for the characteristics of heterogeneous graph. And meta-relation-based attention is applied as one of the indexes for selecting augmented nodes and edges. For the problem of over-squashing of information, triangle based edge adding and removing are designed to alleviate the negative curvature and bring the gain of topology. Finally, the loss function consists of the cross-entropy loss for labeled data and the consistency regularization for unlabeled data. In order to effectively fuse the prediction results of various DA strategies, the sharpening is used. Existing experiments on public datasets, i.e., ACM, DBLP, OGB, and industry dataset MB show that HG-MDA outperforms current SOTA models. Additionly, HG-MDA is applied to user identification in internet finance scenarios, helping the business to add 30% key users, and increase loans and balances by 3.6%, 11.1%, and 9.8%.
translated by 谷歌翻译
The increasing reliance on online communities for healthcare information by patients and caregivers has led to the increase in the spread of misinformation, or subjective, anecdotal and inaccurate or non-specific recommendations, which, if acted on, could cause serious harm to the patients. Hence, there is an urgent need to connect users with accurate and tailored health information in a timely manner to prevent such harm. This paper proposes an innovative approach to suggesting reliable information to participants in online communities as they move through different stages in their disease or treatment. We hypothesize that patients with similar histories of disease progression or course of treatment would have similar information needs at comparable stages. Specifically, we pose the problem of predicting topic tags or keywords that describe the future information needs of users based on their profiles, traces of their online interactions within the community (past posts, replies) and the profiles and traces of online interactions of other users with similar profiles and similar traces of past interaction with the target users. The result is a variant of the collaborative information filtering or recommendation system tailored to the needs of users of online health communities. We report results of our experiments on an expert curated data set which demonstrate the superiority of the proposed approach over the state of the art baselines with respect to accurate and timely prediction of topic tags (and hence information sources of interest).
translated by 谷歌翻译
人群顺序注释可能是一种有效且具有成本效益的方式,用于构建用于序列标签的大型数据集。不同于标记独立实例,对于人群顺序注释,标签序列的质量取决于注释者在捕获序列中每个令牌的内部依赖性方面的专业知识水平。在本文中,我们提出了与人群(SA-SLC)进行序列标记的序列注释。首先,开发了有条件的概率模型,以共同模拟顺序数据和注释者的专业知识,其中引入分类分布以估计每个注释者在捕获局部和非本地标签依赖性以进行顺序注释时的可靠性。为了加速所提出模型的边缘化,提出了有效的标签序列推理(VLSE)方法,以从人群顺序注释中得出有效的地面真相标签序列。 VLSE从令牌级别中得出了可能的地面真相标签,并在标签序列解码的正向推断中进一步介绍了李子标签。 VLSE减少了候选标签序列的数量,并提高了可能的地面真实标签序列的质量。自然语言处理的几个序列标记任务的实验结果显示了所提出的模型的有效性。
translated by 谷歌翻译